Let $y = A\beta + \varepsilon$, where $y$ is an $N \times 1$ vector of observations, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, $A$ is an $N \times p$ design matrix and $\varepsilon$ is a spherically symmetric error term with unknown scale parameter $\sigma$. We consider estimation of $\beta$ under general quadratic loss functions, and, in particular, extend the work of Strawderman [J. Amer. Statist. Assoc. 73 (1978) 623-627] and Casella [Ann. Statist. 8 (1980) 1036-1056, J. Amer. Statist. Assoc. 80 (1985) 753-758] by finding adaptive minimax estimators (which are, under the normality assumption, also generalized Bayes) of $\beta$, which have greater numerical stability (i.e., smaller condition number) than the usual least squares estimator. In particular, we give a subclass of such estimators which, surprisingly, has a very simple form. We also show that under certain conditions the generalized Bayes minimax estimators in the normal case are also generalized Bayes and minimax in the general case of spherically symmetric errors.

1. Introduction. In this paper we consider adaptive ridge regression estimators in the general linear model with homogeneous spherically symmetric errors. There are three main contributions: (a) we propose sufficient conditions on estimators for simultaneously reducing risk and increasing numerical stability relative to the least squares estimator for all full rank design matrices, (b) under normality, we obtain a broad class of generalized Bayes estimators satisfying the above sufficient conditions, and (c) this class contains a subclass of particularly simple form, which, we hope, adds to the practical utility of our results.

Hoerl and Kennard [11] introduced the ridge regression technique as a way to simultaneously reduce the risk and increase the numerical stability of the least squares estimator in ill-conditioned problems. The risk reduction aspect of Hoerl and Kennard's method was often observed in simulations but was not theoretically justified. Strawderman [19] looked at the problem in the context of minimaxity and produced minimax adaptive ridge-type estimators, but ignored the condition number aspect of the problem. Casella [7, 8] considered both the minimaxity and condition number aspects and gave estimators which were minimax and condition number decreasing for some, but not all, design matrices. Neither Strawderman nor Casella gave generalized Bayes minimax estimators. Moreover, to the best of our knowledge, almost all theoretical results on ridge regression in the literature depend on normality.

In the present paper we propose a broad class of minimax estimators which increase the numerical stability of the least squares estimator for all full rank design matrices, under the assumption of a spherically symmetric error distribution. Furthermore, under normality a broad class of generalized Bayes estimators included in the above class is found. What is particularly noteworthy about our class of estimators is that it contains a subclass with a form (adapted to the case of unknown $\sigma^2$) which is remarkably similar to that of the estimators originally suggested in [17] for the case $\mathrm{Cov}(X) = I$. In particular, our simple generalized Bayes estimators of the mean vector are of the form

$$\hat\theta_{SB} = \bigl(I - \alpha/\{\gamma(\alpha+1) + W\}\,C^{-1}\bigr)X,$$

where $W = X'C^{-1}D^{-1}X/S$ for some positive-definite matrices $C$ and $D$.

To be more precise, we start with the familiar linear regression model $y = A\beta + \varepsilon$, where $y$ is an $N \times 1$ vector of observations, $A$ is the known $N \times p$ design matrix of rank $p$, $\beta$ is the $p \times 1$ vector of unknown regression coefficients, and $\varepsilon$ is an $N \times 1$ vector of experimental errors. We assume $\varepsilon$ has a spherically symmetric distribution with a density $f(\varepsilon'\varepsilon/\sigma^2)$, where $\sigma$ is an unknown scale parameter and $f(\cdot)$ is a nonnegative function on the nonnegative real line.

The least squares estimator of $\beta$ is $\hat\beta = (A'A)^{-1}A'y$. Since the covariance matrix of $\hat\beta$ is proportional to $\sigma^2(A'A)^{-1}$, the least squares estimator may not be a suitable estimator when some components of $\hat\beta$ or some linear combinations of $\hat\beta$ have a very large variance and when $A'A$ is nearly singular. Additionally, $(A'A)^{-1}$ may have inflated diagonal values so that small changes in the observations produce large changes in $\hat\beta$. Hoerl and Kennard [11] proposed the ridge estimator

$$\hat\beta_R(k) = (A'A + kI_p)^{-1}A'y, \tag{1.1}$$

where $k$ is a positive constant, to ameliorate these problems. Adding the number $k$ before inverting amounts to increasing each eigenvalue of $A'A$ by $k$. In particular, if $P$ is the $p \times p$ orthogonal matrix of eigenvectors of $(A'A)^{-1}$, with $d_1 \ge d_2 \ge \cdots \ge d_p$ as eigenvalues, it follows that

$$P'(A'A)^{-1}P = D, \qquad P'P = I_p,$$

where $D = \mathrm{diag}(d_1, \ldots, d_p)$. Then (1.1) can be written as

$$\hat\beta_R(k) = P(D^{-1} + kI_p)^{-1}P'A'y. \tag{1.2}$$

The ridge estimator is more stable than $\hat\beta$ in the sense that the condition number of the estimator is reduced.
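The equivalence of (1.1) and (1.2), and the eigenvalue shift by $k$, can be checked numerically. The following is a minimal sketch, assuming a made-up near-collinear design; the data, seed and the value of `k` are illustrative assumptions and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, k = 50, 4, 0.5

# Hypothetical ill-conditioned design: the last column nearly duplicates the first.
A = rng.normal(size=(N, p))
A[:, 3] = A[:, 0] + 1e-2 * rng.normal(size=N)
y = A @ np.ones(p) + rng.normal(size=N)

AtA = A.T @ A
# A'A and (A'A)^{-1} share eigenvectors; eigh returns ascending eigenvalues of A'A,
# so d = 1/lam lists the eigenvalues of (A'A)^{-1} in decreasing order.
lam, P = np.linalg.eigh(AtA)
d = 1.0 / lam

# (1.1): direct ridge formula.
beta_ridge_11 = np.linalg.solve(AtA + k * np.eye(p), A.T @ y)
# (1.2): the same estimator written through P and D.
beta_ridge_12 = P @ np.diag(1.0 / (1.0 / d + k)) @ P.T @ A.T @ y

# Eigenvalues of A'A + k I are those of A'A shifted by k.
eig_ls = np.linalg.eigvalsh(AtA)
eig_ridge = np.linalg.eigvalsh(AtA + k * np.eye(p))
cond_ls = eig_ls.max() / eig_ls.min()
cond_ridge = eig_ridge.max() / eig_ridge.min()
print(cond_ls, cond_ridge)
```

Because every eigenvalue is raised by the same amount, the ratio of the largest to the smallest eigenvalue shrinks, which is the sense in which the ridge estimator is more stable.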

However, we are interested in proposing better estimators than $\hat\beta$ from the decision-theoretic point of view. We measure the loss in estimating $\beta$ by $b$ with loss functions

$$L_j(b, \beta, \sigma^2) = \sigma^{-2}(b - \beta)'(A'A)^j(b - \beta), \tag{1.3}$$

where $(A'A)^j = P\,\mathrm{diag}(d_1^{-j}, \ldots, d_p^{-j})P'$. In particular, $L_j$ for $j = 0, 1, 2$ are known as squared error loss, predictive (or scale invariant) loss and Strawderman's [19] loss, respectively. Then the risk function of an estimator $b$ is given by $R_j(b, \beta, \sigma^2) = E[L_j(b, \beta, \sigma^2)]$. The least squares estimator $\hat\beta$ is minimax with constant risk. Therefore, $b$ is a minimax estimator of $\beta$ if and only if $R_j(b, \beta, \sigma^2) \le R_j(\hat\beta, \beta, \sigma^2)$ for all $\beta$ and $\sigma^2$. Hence, the search for estimators better than $\hat\beta$ is a search for minimax estimators.

To simplify expressions and to make matters a bit clearer, it is helpful to rotate the problem via the following transformation, so that the covariance matrix of $\hat\beta$ becomes diagonal. Let $Q$ be an $N \times N$ orthogonal matrix such that

$$QA = \begin{pmatrix} D^{-1/2}P' \\ 0 \end{pmatrix}$$

and let $D_*$ be the $N \times N$ diagonal matrix $\mathrm{diag}(d_1, \ldots, d_p, 1, \ldots, 1)$. Next define two random vectors $X = (X_1, \ldots, X_p)'$ and $Z = (Z_1, \ldots, Z_n)'$, where $n = N - p$, by

$$\begin{pmatrix} X \\ Z \end{pmatrix} = D_*^{1/2}Qy.$$

Then $(X', Z')'$ has the joint density given by

$$\prod_{i=1}^{p} d_i^{-1/2}\,\sigma^{-p-n} f\bigl(\{(x - \theta)'D^{-1}(x - \theta) + z'z\}/\sigma^2\bigr), \tag{1.4}$$

where $\theta = P'\beta$. Notice also that $X$ and $Z'Z$ can be expressed as $P'\hat\beta$ and $(y - A\hat\beta)'(y - A\hat\beta)$, respectively. Denote $Z'Z$ by $S$, as is customary. The original problem is thus equivalent to estimation of $\theta$ under the loss function $L_j(\delta, \theta, \sigma^2) = (\delta - \theta)'D^{-j}(\delta - \theta)/\sigma^2$, where $j$ corresponds to $j$ in (1.3). We will consider the problem in this equivalent canonical form. Note that $L_0$ is, in a sense, the least favorable among $L_j$ for $j \ge 0$ under multicollinearity because $L_j$ for $j > 0$ relatively reduces the contribution of components with large variance.
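The passage to canonical form can be sketched in code. The helper below is a hypothetical illustration (the name `canonical_form` and the simulated data are assumptions, not part of the paper); it uses the representations $X = P'\hat\beta$ and $S = (y - A\hat\beta)'(y - A\hat\beta)$ rather than constructing $Q$ explicitly, and then checks that the loss (1.3) agrees with its canonical counterpart for $j = 1$.

```python
import numpy as np

def canonical_form(A, y):
    """Return (X, S, d, P): X = P' beta_hat, S = residual sum of squares,
    d = eigenvalues of (A'A)^{-1} in decreasing order, P = eigenvectors."""
    lam, P = np.linalg.eigh(A.T @ A)      # ascending eigenvalues of A'A
    d = 1.0 / lam                         # hence descending eigenvalues of (A'A)^{-1}
    beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
    X = P.T @ beta_hat
    resid = y - A @ beta_hat
    S = float(resid @ resid)
    return X, S, d, P

# Illustrative data (assumed, not from the paper).
rng = np.random.default_rng(1)
N, p = 30, 3
A = rng.normal(size=(N, p))
beta = np.array([1.0, -2.0, 0.5])
y = A @ beta + rng.normal(size=N)

X, S, d, P = canonical_form(A, y)
theta = P.T @ beta

# Loss equivalence for j = 1 (sigma^2 = 1): (b-beta)'(A'A)(b-beta) = (delta-theta)'D^{-1}(delta-theta).
b = rng.normal(size=p)                    # an arbitrary estimate of beta
delta = P.T @ b
loss_beta = float((b - beta) @ (A.T @ A) @ (b - beta))
loss_theta = float(np.sum((delta - theta) ** 2 / d))
print(loss_beta, loss_theta)
```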

Strawderman [19] and Casella [7] essentially considered the class of estimators of the form

$$\hat\theta_R(K) = \bigl(I - \{I + D^{-1}K^{-1}\}^{-1}\bigr)X,$$

which originally came from straight generalization of (1.2), that is, the generalized ridge estimator

$$\hat\beta_R(K) = (A'A + PKP')^{-1}A'y = P(D^{-1} + K)^{-1}P'A'y, \tag{1.5}$$

where $K = \mathrm{diag}(k_1, \ldots, k_p)$. Under general quadratic loss they proposed a sufficient condition for minimaxity under normality for adaptive estimators $\hat\theta_R(K)$, where $K = \psi(X'X/S)\,\mathrm{diag}(a_1, \ldots, a_p)$, $\psi$ is a suitable positive function and $a_i$ is positive for all $i$. Casella [7] discussed the relationship between minimaxity and stability (in terms of lowered condition number) and pointed out that forcing ridge regression estimators to be minimax makes it difficult for them to provide the numerical stability for which they were originally intended. Casella [8] found that, under certain conditions on the structure of the eigenvalues of the design matrix, both minimaxity under $L_0$ and stability can be simultaneously achieved for a special case $\psi(w) = w^{-1}$.

In Section 2, for the general spherically symmetric case, we give a class of minimax estimators of $\theta$ (and hence, by transformation, $\beta$) under $L_j$, somewhat broader than those of Strawderman [19] and Casella [7, 8]. We then give a class of generalized hierarchical prior distributions on $\theta$ and $\sigma^2$ which, in the normal case, give generalized Bayes estimators satisfying the minimaxity condition. This class generalizes (also to the case of unknown $\sigma^2$) the class of priors in [3, 4, 10, 14, 18]. We further show that, for certain choices of parameters in the hierarchy, the resulting estimators have the simple form indicated above. We also show that in certain cases a version of our minimax estimator is generalized Bayes for the entire class of spherically symmetric error distributions. Section 3 is devoted to the study of general conditions under which the generalized ridge regression estimator $\hat\beta_R(K)$ competitive with $\hat\beta$ has increased numerical stability (i.e., decreased condition number). Section 4 is devoted to showing that we may always choose a minimax estimator (which is also generalized Bayes under normality) in our class which has greater numerical stability than the least squares estimator. In particular, our simple generalized Bayes minimax stable estimators under normality are quite practical for the general spherically symmetric case. In Section 5 we give some numerical results.

2. A class of minimax generalized Bayes estimators. In this section we first give a sufficient condition for minimaxity under the loss $L_j$ in the spherically symmetric case, and then use it to obtain a class of generalized Bayes minimax estimators in the normal case. This class contains a subclass of a particularly simple form, which we hope adds to the practical utility of our results. We also show that in certain cases a version of our minimax estimator is generalized Bayes for the entire class of spherically symmetric error distributions.

Our estimators are of the form

$$\hat\theta_\phi = \Bigl(I - \frac{\phi(W)}{W}C^{-1}\Bigr)X, \qquad W = X'C^{-1}D^{-1}X/S, \tag{2.1}$$

where $C = \mathrm{diag}(c_1, \ldots, c_p)$ with $c_i \ge 1$ for any $i$. We note that estimators of the form (2.1) satisfy "directional consistency," a weak necessary condition for admissibility discussed in [5].

First we give a sufficient condition for minimaxity.

Theorem 2.1. Suppose $(X', Z')'$ has a distribution given by (1.4). Then $\hat\theta_\phi$ given by (2.1) is minimax under $L_j$ if $\phi'(w) \ge 0$ and

$$0 \le \phi(w) \le 2(n+2)^{-1}\Bigl(\frac{\sum\{d_i^{1-j}/c_i\}}{\max\{d_i^{1-j}/c_i\}} - 2\Bigr).$$

Proof. See the Appendix.

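Next we develop a class of generalized Bayes estimators under normality. Suppose the distribution of $(X', Z')'$ is normal with covariance matrix $\sigma^2\,\mathrm{diag}(d_1, \ldots, d_p, 1, \ldots, 1)$ and mean vector $(\theta', 0')'$. Consider the following generalized prior distribution:

$$\theta \mid \lambda, \eta \sim N_p\bigl(0,\ \eta^{-1}D(\lambda^{-1}C - I)\bigr) \quad \text{for } \eta = \sigma^{-2},$$
$$\lambda \propto \lambda^{a}(1 - \gamma\lambda)^{b} I_{(0, 1/\gamma)}(\lambda) \quad \text{for } \gamma \ge 1, \qquad \eta \propto \eta^{e}. \tag{2.2}$$

This is a generalization of priors considered in [3, 4, 10, 14, 18]. The marginal density of $X$, $S$, $\lambda$ and $\eta$ is proportional to

$$\int \exp\Bigl(-\frac{\eta}{2}\bigl\{(x-\theta)'D^{-1}(x-\theta) + s + \theta'D^{-1}(\lambda^{-1}C - I)^{-1}\theta\bigr\}\Bigr)\,\eta^{p+n/2+e}\lambda^{p/2+a}\prod_{i=1}^{p}(c_i - \lambda)^{-1/2}(1 - \gamma\lambda)^{b}\,d\theta$$
$$\propto \exp\Bigl(-\frac{\eta s}{2}(1 + \lambda w)\Bigr)\eta^{p/2+n/2+e}\lambda^{p/2+a}(1 - \gamma\lambda)^{b}, \tag{2.3}$$

where $w = x'C^{-1}D^{-1}x/s$. Under the loss $L_j$, the generalized Bayes estimator is given by $E(\eta\theta \mid X, S)/E(\eta \mid X, S)$, which can be written, using (2.3), as

$$\hat\theta_{GB} = \Bigl(I - \frac{E(\lambda\eta \mid X, S)}{E(\eta \mid X, S)}C^{-1}\Bigr)X = \Bigl(I - \frac{\phi_{GB}(W)}{W}C^{-1}\Bigr)X.$$

When $p/2 + n/2 + e + 2 > 0$,

$$\int_0^\infty \eta^{p/2+n/2+e+1}\exp\Bigl(-\frac{\eta s}{2}(1 + \lambda w)\Bigr)d\eta \propto (1 + \lambda w)^{-p/2-n/2-e-2} \tag{2.4}$$

and we have

$$\phi_{GB}(w) = \frac{w}{\gamma}\,\frac{\int_0^1 t^{p/2+a+1}(1-t)^{b}(1 + wt/\gamma)^{-p/2-n/2-e-2}\,dt}{\int_0^1 t^{p/2+a}(1-t)^{b}(1 + wt/\gamma)^{-p/2-n/2-e-2}\,dt}, \tag{2.5}$$

which is well defined for $a > -p/2 - 1$ and $b > -1$. Using an identity which is given by the change of variables $t = (1 + w)\lambda/(1 + w\lambda)$,

$$\int_0^1 \lambda^{\alpha}(1 - \lambda)^{\beta}(1 + w\lambda)^{-\gamma}\,d\lambda = \frac{1}{(w + 1)^{\alpha+1}}\int_0^1 t^{\alpha}(1 - t)^{\beta}\Bigl\{1 - \frac{tw}{w + 1}\Bigr\}^{-\alpha-\beta+\gamma-2}dt,$$

we have

$$\phi_{GB}(w) = \frac{w}{w + \gamma}\,\frac{\int_0^1 t^{p/2+a+1}(1-t)^{b}\{1 - tw/(w+\gamma)\}^{n/2+e-a-b-1}\,dt}{\int_0^1 t^{p/2+a}(1-t)^{b}\{1 - tw/(w+\gamma)\}^{n/2+e-a-b}\,dt}. \tag{2.6}$$

The following lemma gives some useful properties of $\phi_{GB}(w)$.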

Lemma 2.2. If $b \ge 0$, $e > -p/2 - n/2 - 2$ and $-p/2 - 1 < a < n/2 + e$, we have for $\phi(w) = \phi_{GB}(w)$ given by (2.6):

(i) $\phi(w)$ is monotone increasing in $w$.

(ii) $\phi(w)/w$ is monotone decreasing in $w$.

(iii) $\lim_{w\to\infty}\phi(w) = (p/2 + a + 1)/(n/2 + e - a)$.

(iv) $\lim_{w\to 0}\{\phi(w)/w\} = (p/2 + a + 1)/\{\gamma(p/2 + a + b + 2)\}$.

Proof. The proof of (i) and (ii) is straightforward using monotone likelihood ratio properties of the densities implied in (2.5) and (2.6). The proof of (iii) and (iv) follows from (2.6) and (2.5), respectively. □

By Lemma 2.2, parts (i) and (iii), and Theorem 2.1, we have immediately the following result.

Theorem 2.3. If $b \ge 0$, $e > -p/2 - n/2 - 2$ and $-p/2 - 1 < a < n/2 + e$, then $\hat\theta_{GB}$ is minimax under $L_j$, provided $c_1, \ldots, c_p$ are chosen so that

$$\frac{p/2 + a + 1}{n/2 + e - a} \le \frac{2}{n+2}\Bigl(\frac{\sum\{d_i^{1-j}/c_i\}}{\max\{d_i^{1-j}/c_i\}} - 2\Bigr).$$

Note that if we choose $c_i = d_i/d_p$ under $L_0$, the bound on the RHS is $2(p - 2)/(n + 2)$. The choices of $a = -2$ and $e = -1$ give a value of $(p - 2)/(n + 2)$ for the LHS and, hence, for $p \ge 3$ and $n \ge 1$, these choices of $a$ and $e$ give minimax generalized Bayes estimators for any $b \ge 0$ and $\gamma \ge 1$. As Casella [7, 8] indicated, this choice of $c_i$ may be poor from the point of view of the numerical stability of the estimator. It is important to note at this stage that there is substantial flexibility in the choice of $C$ and this flexibility is the key to finding minimax estimators with increased numerical stability. We consider this point further in Sections 3 and 4.
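The bound of Theorem 2.1 and the worked case $c_i = d_i/d_p$ under $L_0$ can be verified with a few lines of code. This is a small sketch; the function name `minimax_bound` and the particular eigenvalues in `d` are illustrative assumptions.

```python
import numpy as np

def minimax_bound(d, c, n, j):
    """Upper bound on phi(w) in Theorem 2.1:
    2/(n+2) * (sum_i r_i / max_i r_i - 2), with r_i = d_i^(1-j) / c_i."""
    d = np.asarray(d, dtype=float)
    c = np.asarray(c, dtype=float)
    r = d ** (1 - j) / c
    return 2.0 / (n + 2) * (r.sum() / r.max() - 2.0)

# Illustrative eigenvalues d_1 >= ... >= d_p (assumed values).
d = np.array([8.0, 4.0, 2.0, 1.0, 0.5])
p, n = len(d), 10

# Choice c_i = d_i / d_p under L_0: every ratio d_i / c_i equals d_p.
c = d / d[-1]
rhs = minimax_bound(d, c, n, j=0)

# Left-hand side of the condition in Theorem 2.3 for a = -2, e = -1.
a, e = -2.0, -1.0
lhs = (p / 2 + a + 1) / (n / 2 + e - a)
print(lhs, rhs)
```

With these choices the left-hand side is exactly half of the right-hand side, so the minimaxity condition holds with room to spare.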

2.1. A class of simple generalized Bayes minimax estimators. When $b = n/2 - a + e - 1$ in (2.6), the expression for $\phi_{GB}(w)$ takes a particularly simple form. In this case,

$$\phi_{GB}(w) = \frac{w}{w + \gamma}B(p/2 + a + 2, b + 1) \times \Bigl\{B(p/2 + a + 1, b + 1) - \frac{w}{w + \gamma}B(p/2 + a + 2, b + 1)\Bigr\}^{-1}$$
$$= \frac{\alpha w}{\gamma(\alpha + 1) + w} \quad [= \phi_{SB}(w), \text{ say}], \tag{2.7}$$

where $\alpha = (p/2 + a + 1)/(b + 1) = (p/2 + a + 1)/(n/2 + e - a)$. Therefore, our simple generalized Bayes estimator is

$$\hat\theta_{SB} = \Bigl(I - \frac{\alpha}{\gamma(\alpha + 1) + W}C^{-1}\Bigr)X. \tag{2.8}$$
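The collapse of the integral form (2.5) to the closed form $\alpha w/\{\gamma(\alpha+1) + w\}$ when $b = n/2 - a + e - 1$ can be checked by numerical quadrature. The sketch below uses Gauss-Legendre nodes from NumPy; the parameter values and function names are assumptions chosen only so that the integrands are smooth on $[0, 1]$.

```python
import numpy as np

def phi_gb(w, p, n, a, b, e, gamma, nodes=200):
    """phi_GB(w) from (2.5), evaluated by Gauss-Legendre quadrature on [0, 1]."""
    x, wt = np.polynomial.legendre.leggauss(nodes)   # nodes and weights on [-1, 1]
    t = 0.5 * (x + 1.0)                              # map nodes to [0, 1]
    wt = 0.5 * wt
    k = p / 2 + a
    power = -(p / 2 + n / 2 + e + 2)
    common = (1 - t) ** b * (1 + w * t / gamma) ** power
    num = np.sum(wt * t ** (k + 1) * common)
    den = np.sum(wt * t ** k * common)
    return (w / gamma) * num / den

def phi_sb(w, alpha, gamma):
    """Closed form in (2.7)."""
    return alpha * w / (gamma * (alpha + 1) + w)

# Illustrative parameter values (assumed).
p, n, a, e, gamma = 6, 10, -2.0, -1.0, 1.5
b = n / 2 - a + e - 1                  # the special choice of b in Section 2.1
alpha = (p / 2 + a + 1) / (b + 1)

for w in (0.1, 1.0, 5.0):
    print(w, phi_gb(w, p, n, a, b, e, gamma), phi_sb(w, alpha, gamma))
```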

Since $\phi_{SB}(w)$ is increasing in $w$ and approaches $\alpha$ as $w \to \infty$, we have the following corollary, which follows immediately from Theorem 2.1.

Corollary 2.4. $\hat\theta_{SB}$ given by (2.8) is minimax under $L_j$, provided $c_1, \ldots, c_p$ are chosen so that

$$0 < \alpha \le \frac{2}{n+2}\Bigl(\frac{\sum\{d_i^{1-j}/c_i\}}{\max\{d_i^{1-j}/c_i\}} - 2\Bigr).$$

In Section 4 we will show that we can always choose $\alpha$, $\gamma$ and $c_1, \ldots, c_p$ to simultaneously achieve minimaxity and an increase in the numerical stability of the least squares estimator.

It is interesting to note that, when $C = D = I_p$, our simple estimator has the form

$$\Bigl(1 - \frac{\alpha}{\gamma(\alpha + 1) + X'X/S}\Bigr)X.$$

This is very closely related to Stein's [17] initial class of estimators. He suggested that, for $X \sim N(\theta, I_p)$ with $p \ge 3$, there exist estimators dominating the usual estimator $X$ among a class of estimators of the form $\delta_{a,b} = (1 - b/(a + X'X))X$ for large $a$ and small $b$. Hence, our estimators may be regarded as a variant for the unknown variance case.

Following Stein [17], James and Stein [12] showed that $\delta_{a,b}$ for $a = 0$ and $0 < b < 2(p - 2)$ dominates $X$. Since Strawderman [18] derived Bayes minimax estimators, many authors have proposed various minimax (generalized) Bayes estimators. However, the form of these estimators is invariably complicated, like our expression (2.6) above. Simple estimators $\delta_{a,b}$ have received little attention although, for $a > 0$ and $0 < b < 2(p - 2)$, $\delta_{a,b}$ is easily shown to be minimax by using Baranchik's [1] condition. It seems that most statisticians have believed that generalized Bayes estimators which improve on $X$ must have a quite complicated structure. Our result above indicates that this is not so and that generalized Bayes minimax estimators improving on $X$ may indeed have a very simple form.

2.2. Generalized Bayes estimators for spherically symmetric distributions. It seems useful to show that the above generalized Bayes results can be extended to the general spherically symmetric case (1.4) in certain situations. What is remarkable about the results is that the resulting generalized Bayes estimators are independent of the form of $f(\cdot)$ and are, hence, identical to those in the normal case. In particular, assume that $C = I$, $\gamma = 1$ and $b = -a - 2$ in the prior given by (2.2). Then the joint density of $\theta$ and $\eta$ is proportional to $(\theta'D^{-1}\theta)^{-p/2-a-1}\eta^{-a-1+e}$ because

$$\eta^{p/2+e}\int_0^1 \lambda^{p/2+a}(1 - \lambda)^{-p/2-a-2}\exp\Bigl(-\frac{\eta}{2}\,\frac{\lambda}{1-\lambda}\,\theta'D^{-1}\theta\Bigr)d\lambda \propto (\theta'D^{-1}\theta)^{-p/2-a-1}\eta^{-a-1+e} \tag{2.9}$$

if $p/2 + a + 1 > 0$. Under quadratic loss $\eta(d - \theta)'(d - \theta)$, the generalized Bayes estimator is given by $E(\eta\theta \mid X, S)/E(\eta \mid X, S)$ and we have the generalized Bayes estimator, with respect to our prior,

$$\frac{\int_{R^p}\theta\int_0^\infty \eta^{(n+p)/2-a+e} f\bigl(\eta\{(X-\theta)'D^{-1}(X-\theta) + S\}\bigr)(\theta'D^{-1}\theta)^{-p/2-a-1}\,d\eta\,d\theta}{\int_{R^p}\int_0^\infty \eta^{(n+p)/2-a+e} f\bigl(\eta\{(X-\theta)'D^{-1}(X-\theta) + S\}\bigr)(\theta'D^{-1}\theta)^{-p/2-a-1}\,d\eta\,d\theta}$$
$$= \frac{\int_{R^p}\theta\,\{(X-\theta)'D^{-1}(X-\theta) + S\}^{-(n+p)/2+a-e-1}(\theta'D^{-1}\theta)^{-p/2-a-1}\,d\theta}{\int_{R^p}\{(X-\theta)'D^{-1}(X-\theta) + S\}^{-(n+p)/2+a-e-1}(\theta'D^{-1}\theta)^{-p/2-a-1}\,d\theta}$$

if $\int_0^\infty \eta^{(n+p)/2-a+e} f(\eta)\,d\eta < \infty$. Note that this does not depend on $f$ and, hence, is equal to the generalized Bayes estimator in the normal case. In the normal case, as seen in Section 2.1, the estimator is well defined if $a > -p/2 - 1$, $b > -1$ and $e > -p/2 - n/2 - 2$. Since $a = -b - 2$, the inequality $-p/2 - 1 < a < -1$ should also be satisfied.

If $b = n/2 - a + e - 1$, which implies $e = -n/2 - 1$, we have a simple generalized Bayes estimator

$$\hat\theta_{SB} = \bigl(1 - \alpha/(\alpha + 1 + W)\bigr)X,$$

where $\alpha = (p/2 + a + 1)/(-a - 1)$ and $W = X'D^{-1}X/S$. Note that $\alpha = (p/2 + a + 1)/(-a - 1)$ can take any positive value because $-p/2 - 1 < a < -1$.

Remark. The most important point is that, when $a = -b - 2$, $\theta$ and $\eta$ can be separated as in (2.9). Furthermore, unless $\gamma = 1$, $C = I$ and $a = -b - 2$ are all satisfied simultaneously, the density cannot be so separated. The results in this section are closely related to those in [15].

3. Condition numbers and numerical stability. As in Casella [8] and other papers, we use the condition number to measure numerical stability of our ridge-type estimators. This discussion focuses on the stability of estimators of $\beta$ (as opposed to estimators of $\theta$). Recall that our estimators of $\theta$ may be represented as $\hat\theta_\phi = (I - tC^{-1})X$, where $t = \phi(w)/w$ and $w = x'C^{-1}D^{-1}x/s$. The vector of regression parameters, $\beta$, is related to the mean vector $\theta$ through the orthogonal matrix $P$ ($\theta = P'\beta$), and the observation vector $X$ in Section 2 is related to the least squares estimator, $\hat\beta$, through $X = P'\hat\beta$. In this section we study the numerical stability of ridge-type estimators of $\beta$ arising from our improved estimators $\hat\theta_\phi$ of $\theta$ through

$$\hat\beta_\phi = P\hat\theta_\phi = P\bigl(\mathrm{diag}\{d_i^{-1}(1 - t/c_i)^{-1}\}\bigr)^{-1}P'A'y = G^{-1}A'y, \tag{3.1}$$

where $G = P\,\mathrm{diag}\{d_i^{-1}(1 - t/c_i)^{-1}\}P'$. By (3.1), $\hat\beta_\phi$ may be regarded as a generalized ridge regression estimator $\hat\beta_R(K)$ given by (1.5) when we put $k_i = t/\{d_i(c_i - t)\}$.
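The identification $k_i = t/\{d_i(c_i - t)\}$ can be confirmed numerically: shrinking in the canonical coordinates and rotating back gives the same vector as the generalized ridge formula (1.5). The following sketch uses simulated data; the design, the choice of `c`, and the constants `alpha` and `gamma` are hypothetical and serve only to produce some valid $t < 1 \le c_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 40, 4
A = rng.normal(size=(N, p))
y = A @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(size=N)

lam, P = np.linalg.eigh(A.T @ A)
d = 1.0 / lam                                  # eigenvalues of (A'A)^{-1}, descending
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
X = P.T @ beta_hat
S = float(np.sum((y - A @ beta_hat) ** 2))

c = (d[0] / d) ** 0.5                          # hypothetical choice with c_i >= 1
alpha, gamma = 0.3, 1.0                        # hypothetical tuning constants
w = float(X @ (X / (c * d)) / S)               # w = x'C^{-1}D^{-1}x / s
phi = alpha * w / (gamma * (alpha + 1) + w)    # the simple phi_SB of (2.7)
t = phi / w

# Shrinkage in canonical coordinates, rotated back to the beta scale.
theta_phi = (1.0 - t / c) * X
beta_phi = P @ theta_phi

# Generalized ridge estimator (1.5) with k_i = t / {d_i (c_i - t)}.
k = t / (d * (c - t))
beta_ridge = np.linalg.solve(A.T @ A + P @ np.diag(k) @ P.T, A.T @ y)
print(np.max(np.abs(beta_phi - beta_ridge)))
```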

The condition number of a matrix $H$ is defined by $\kappa(H) = \|H\|\,\|H^{-1}\|$, where $\|H\| = \sup_{x'x=1}(x'H'Hx)^{1/2} = \max_i \lambda_i^{1/2}$, where $\lambda_i$ are the eigenvalues of the positive-definite matrix $H'H$. It follows that if $H$ is a positive-definite matrix, $\kappa(H) = \kappa(H^{-1})$. As indicated in [8] (see also [2]), the condition number measures the numerical sensitivity of the solution of a linear equation $\hat\beta = H^{-1}A'y$. In particular, if $\delta\hat\beta$ and $\delta(A'y)$ indicate perturbations in $\hat\beta$ and $A'y$, respectively,

$$|\delta\hat\beta|/|\hat\beta| \le \kappa(H)\bigl(|\delta A'y|/|A'y|\bigr),$$

where $|\cdot|$ denotes the usual Euclidean norm. For simplicity of notation, we define the condition number of an estimator of the form (3.1), $\kappa(\hat\beta_\phi)$, to be equal to the condition number of the matrix $G$, $\kappa(G^{-1}) = \kappa(G)$, that is, $\kappa(\hat\beta_\phi) = \kappa(G)$.

It follows immediately from the definition of $\kappa(G)$ that (we assume $t < 1$, $c_i \ge 1$)

$$\kappa(\hat\beta) = d_1/d_p \tag{3.2}$$

and

$$\kappa(\hat\beta_\phi) = \frac{\max_i d_i(1 - t/c_i)}{\min_i d_i(1 - t/c_i)}. \tag{3.3}$$

In terms of numerical stability, a smaller condition number implies greater stability. Of course, the condition number given in (3.3) depends on $t = \phi(w)/w$ and, in particular, when $t = 0$, (3.3) reduces to (3.2). We will be interested in finding conditions on the estimator so that, for all possible values of $w$, we have the inequality $\kappa(\hat\beta_\phi) \le \kappa(\hat\beta)$.

The following result gives condition number improving estimators under two different conditions on $c_1, \ldots, c_p$.

Theorem 3.1. Suppose $0 \le \phi(w)/w \le t_0 < 1$ for any $w$. Then $\kappa(\hat\beta_\phi) \le \kappa(\hat\beta)$ for any $w$ if either:

(i) $c_1 \le c_2 \le \cdots \le c_p$ and

$$t_0 \le \min_{i>j}\Bigl(\frac{c_ic_j(d_1d_j - d_id_p)}{c_id_1d_j - c_jd_id_p}\Bigr), \tag{3.4}$$

or

(ii) $c_p \ge c_1 \ge c_2 \ge \cdots \ge c_{p-1}$ and

$$t_0 \le \min\Bigl(\frac{c_1c_{p-1}(d_{p-1} - d_p)}{c_1d_{p-1} - c_{p-1}d_p},\ \frac{c_{p-1}c_p(d_1d_{p-1} - d_p^2)}{c_pd_1d_{p-1} - c_{p-1}d_p^2}\Bigr). \tag{3.5}$$

Proof. See the Appendix.
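Part (i) of Theorem 3.1 lends itself to a direct numerical check: compute the right-hand side of (3.4), then confirm that the condition number (3.3) never exceeds $d_1/d_p$ for $t$ between $0$ and that bound. The sketch below is illustrative only; the function names and the values of `d` and `c` are assumptions.

```python
import numpy as np

def cond_number(d, c, t):
    """(3.3): max_i d_i(1 - t/c_i) / min_i d_i(1 - t/c_i)."""
    g = d * (1.0 - t / c)
    return g.max() / g.min()

def t0_bound_increasing_c(d, c):
    """Right-hand side of (3.4), intended for c_1 <= ... <= c_p."""
    p = len(d)
    best = np.inf
    for i in range(p):
        for j in range(i):                      # all pairs with i > j
            num = c[i] * c[j] * (d[0] * d[j] - d[i] * d[-1])
            den = c[i] * d[0] * d[j] - c[j] * d[i] * d[-1]
            if den > 0:                         # den == 0 imposes no constraint
                best = min(best, num / den)
    return best

# Illustrative values (assumed): decreasing d, increasing c with c_1 = 1.
d = np.array([8.0, 4.0, 2.0, 1.0])
c = (d[0] / d) ** 0.5

t0 = min(t0_bound_increasing_c(d, c), 0.99)     # Theorem 3.1 also requires t0 < 1
kappa_ls = d[0] / d[-1]                         # (3.2)
kappas = [cond_number(d, c, t) for t in np.linspace(0.0, t0, 50)]
print(t0, kappa_ls, max(kappas))
```

In this example the bound is attained at the upper end of the range, so values of $t$ just above it would make the condition number larger than that of the least squares estimator.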

In the next section we will see that the two conditions above allow us to choose minimax generalized Bayes estimators with increased numerical stability for all full rank designs in the normal case.

4. Minimaxity and stability. In this section we show that the results of the previous two sections can be combined to give minimax estimators which simultaneously reduce the condition number relative to the least squares estimator. Then we give a corollary for the simple generalized Bayes estimator $\hat\theta_{SB}$ given by (2.8) under the normal case, because it seems to have practical utility for the general spherically symmetric case. Finally, we add some comments for the case of more general quadratic loss than $L_j$ given by (1.3).

Note that it seems generally desirable to have $c_1 \le \cdots \le c_p$ since this implies that the components of $X$ with larger variances get shrunk more. See [8] for an expanded discussion of this point.

Our first result below shows that we may find a minimax condition number improving estimator satisfying $c_1 \le \cdots \le c_p$ whenever $\sum\{d_i/d_1\}^{1-j} - 2 > 0$. Note that, when $j \ge 1$, $\sum\{d_i/d_1\}^{1-j} - 2$ is always positive.

Theorem 4.1. Suppose $p \ge 3$ and $\sum\{d_i/d_1\}^{1-j} - 2 > 0$. If $d_1 > d_2$, let $\eta_*$ be the unique root such that $\sum\{d_i/d_1\}^{\eta} = 2$ and let $\eta_{**}$ be any value in $(\max\{0, 1 - j\}, \eta_*)$. If $d_1 = d_2$, let $\eta_{**}$ be any value $> \max(1 - j, 0)$. Then if

$$c_i = (d_1/d_i)^{j-1+\eta_{**}},$$
$$u_+ = 2(n+2)^{-1}\Bigl(\sum\{d_i/d_1\}^{\eta_{**}} - 2\Bigr)$$

and

$$v_+ = \min_{i>j}\Bigl(\frac{c_ic_j(d_1d_j - d_id_p)}{c_id_1d_j - c_jd_id_p}\Bigr), \tag{4.1}$$

the estimator $\hat\theta_\phi$, where $0 \le \phi(w)/w \le v_+$ for any $w$, $\phi(w)$ is increasing and $\lim_{w\to\infty}\phi(w) \le u_+$, is minimax under $L_j$ and condition number decreasing; further, $c_1 \le \cdots \le c_p$.

Proof. Since $(d_i/d_1)^{\eta}$ is strictly decreasing in $\eta$ if $d_i/d_1 < 1$, there exists exactly one root $\eta_*$ of $\sum(d_i/d_1)^{\eta} = 2$ if $d_2/d_1 < 1$, and that root is strictly larger than $1 - j$. If $d_1 = d_2$, $\sum(d_i/d_1)^{\eta} > 2$ for any $\eta \ge 0$. Hence, $\eta_{**} \ge 1 - j$ and $c_i = (d_1/d_i)^{j-1+\eta_{**}}$ is monotone nondecreasing in $i$. Also from Theorem 2.1 we have minimaxity, provided

$$0 \le \phi(w) \le \frac{2}{n+2}\Bigl(\frac{\sum\{d_i^{1-j}/c_i\}}{\max\{d_i^{1-j}/c_i\}} - 2\Bigr) = \frac{2}{n+2}\Bigl(\sum\{d_i/d_1\}^{\eta_{**}} - 2\Bigr) = u_+ \quad (> 0).$$

Also by Theorem 3.1(i), since $c_1 \le c_2 \le \cdots \le c_p$, the estimator $\hat\theta_\phi$ will have reduced condition number, provided $0 \le \phi(w)/w \le v_+$ for any $w$. □
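The constants of Theorem 4.1 are straightforward to compute once the eigenvalues $d_i$ are known. The sketch below finds $\eta_*$ by bisection and takes $\eta_{**}$ as the midpoint of its admissible interval; the function names, the bisection bracket and the example eigenvalues are assumptions made for illustration, and the code presumes $d_1 > d_2$ and $\sum\{d_i/d_1\}^{1-j} > 2$.

```python
import numpy as np

def eta_star(d, iters=200):
    """Root of sum_i (d_i/d_1)^eta = 2 by bisection (assumes d_1 > d_2, p >= 3)."""
    ratio = d / d[0]
    lo, hi = 0.0, 200.0                  # bracket assumed wide enough for the root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(ratio ** mid) > 2.0:   # the sum is decreasing in eta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def v_plus(d, c):
    """(4.1): minimum over pairs i > j of c_i c_j (d_1 d_j - d_i d_p) / (c_i d_1 d_j - c_j d_i d_p)."""
    vals = []
    for i in range(len(d)):
        for j in range(i):
            num = c[i] * c[j] * (d[0] * d[j] - d[i] * d[-1])
            den = c[i] * d[0] * d[j] - c[j] * d[i] * d[-1]
            if den > 0:
                vals.append(num / den)
    return min(vals)

def theorem41_constants(d, n, j):
    es = eta_star(d)
    eta2 = 0.5 * (max(0.0, 1.0 - j) + es)        # one admissible eta_** (midpoint)
    c = (d[0] / d) ** (j - 1 + eta2)
    u = 2.0 / (n + 2) * (np.sum((d / d[0]) ** eta2) - 2.0)
    return es, eta2, c, u, v_plus(d, c)

# Illustrative eigenvalues (assumed): geometric sequence with ratio 1.2, p = 9.
d = 1.2 ** np.arange(4, -5, -1.0)
es, eta2, c, u_plus, vp = theorem41_constants(d, n=10, j=1)
print(es, eta2, u_plus, vp)
```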
From Theorem 4.1 we easily see the robustness of minimaxity with respect to the loss function.

Corollary 4.2. A minimax estimator under $L_j$ for fixed $j$, which is given by Theorem 4.1, retains minimaxity under $L_k$ for $j \le k < j + \eta_{**}$.

For example, suppose $\sum\{d_i/d_1\}^2 - 2 > 0$ and $L_0$ is used. In Theorem 4.1, we can choose $\eta_{**}$ strictly greater than 2. Hence, a minimax estimator using such $\eta_{**}$ under $L_0$ retains minimaxity under $L_1$ and $L_2$.

There remains the case where $\sum\{d_i/d_1\}^{1-j} - 2 \le 0$. Recall that $\sum\{d_i/d_1\}^{1-j} - 2$ is always positive for $j \ge 1$. This is the case where no spherically symmetric estimator ($c_1 = c_2 = \cdots = c_p$) and, therefore, no estimator with $c_1 \le c_2 \le \cdots \le c_p$ can be minimax (e.g., see [6]). Our solution, while less pleasing in a sense than Theorem 4.1, nevertheless allows a minimax estimator which reduces the condition number and, hence, increases the stability.

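Theorem 4.3. Suppose $p \ge 3$ and $\sum\{d_i/d_1\}^{1-j} - 2 \le 0$. If $p \ge 4$, let $\nu_* \in (0, 1 - j)$ be the unique solution of $\sum_{i=1}^{p-1}(d_i/d_1)^{\nu} = 2$. Let $\nu_{**}$ be any value in $[0, \nu_*)$. If $p = 3$, choose $\nu_{**} = 0$. Then if $c_i = (d_i/d_{p-1})^{1-j-\nu_{**}}$ for $i = 1, 2, \ldots, p - 1$ and $c_p \ge c_1$,

$$u_- = 2(n+2)^{-1}\Bigl(\sum_{i=1}^{p-1}\{d_i/d_1\}^{\nu_{**}} - 2 + \frac{1}{c_p}\Bigl\{\frac{d_p}{d_{p-1}}\Bigr\}^{1-j}\Bigl\{\frac{d_{p-1}}{d_1}\Bigr\}^{\nu_{**}}\Bigr)$$

and

$$v_- = \min\Bigl(\frac{c_1c_{p-1}(d_{p-1} - d_p)}{c_1d_{p-1} - c_{p-1}d_p},\ \frac{c_{p-1}c_p(d_1d_{p-1} - d_p^2)}{c_pd_1d_{p-1} - c_{p-1}d_p^2}\Bigr),$$

the estimator $\hat\theta_\phi$, where $0 \le \phi(w)/w \le v_-$ for any $w$, $\phi(w)$ is increasing and $\lim_{w\to\infty}\phi(w) \le u_-$, is minimax under $L_j$ and condition number decreasing.

Proof. It is easy to see, as in Theorem 4.1, that $\nu_{**}$ can be chosen as indicated. In this case, Theorem 2.1 implies minimaxity, provided

$$0 \le \phi(w) \le u_-.$$

Theorem 3.1(ii) then implies, since $c_p \ge c_1 \ge c_2 \ge \cdots \ge c_{p-1}$, that our estimator is condition improving if $0 \le \phi(w)/w \le v_-$ for any $w$. □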
Combining Lemma 2.2 and Theorems 4.1 and 4.3, we see that versions of Theorems 4.1 and 4.3 are valid for the broad class of generalized Bayes minimax estimators of Theorem 2.3. We omit the straightforward details. We give explicitly a corollary of Theorems 4.1 and 4.3 for our simple generalized Bayes estimator $\hat\theta_{SB}$, because this seems to have practical utility in the general spherically symmetric case.

Corollary 4.4. $\hat\theta_{SB} = (I - \alpha/\{\gamma(\alpha + 1) + W\}C^{-1})X$ is minimax under $L_j$ and condition number decreasing (and generalized Bayes under normality) if either:

(i) under the setting of Theorem 4.1, $\alpha \le u_+$ and $\gamma \ge \alpha/\{(\alpha + 1)v_+\}$, or

(ii) under the setting of Theorem 4.3, $\alpha \le u_-$ and $\gamma \ge \alpha/\{(\alpha + 1)v_-\}$.

Hence, in the normal case, for any full rank design, we may choose a simple generalized Bayes minimax estimator with increased numerical stability over the least squares estimator $\hat\beta$. Further, these estimators remain minimax for all spherically symmetric error distributions.

Finally, we briefly consider the case of general quadratic loss functions $L_Q = \sigma^{-2}(b - \beta)'Q(b - \beta)$ for a positive definite matrix $Q$. Recall that we have assumed $Q = (A'A)^j$ throughout the paper. It is essential for the derivation of minimax estimators with numerical stability in Theorems 4.1 and 4.3 that $A'A$ and $(A'A)^j$ have common eigenvectors. For a general $Q$ which does not have common eigenvectors with $A'A$, let $M$ be a nonsingular matrix which simultaneously diagonalizes $A'A$ and $Q$, where $M$ satisfies

$$M'(A'A)^{-1}M = G = \mathrm{diag}(g_1, \ldots, g_p), \qquad MM' = Q.$$

Let $X = M'\hat\beta$ and $\theta = M'\beta$. As in Section 1, we see that $(X', Z')'$ has the joint density $\prod g_i^{-1/2}\sigma^{-N} f(\{(x - \theta)'G^{-1}(x - \theta) + z'z\}/\sigma^2)$ and that the estimation problem of $\theta$ under the squared loss function $(\delta - \theta)'(\delta - \theta)$ is derived as the equivalent canonical problem. Hence, we can have the same minimaxity result in Theorem 2.1 for the shrinkage estimator of the form (2.1) if $g_i$ is substituted for $d_i$. The corresponding generalized ridge estimator becomes

$$(A'A + MKM')^{-1}A'y,$$

where $K = \mathrm{diag}(k_1, \ldots, k_p)$ for $k_i = t/\{g_i(c_i - t)\}$. The eigenvalues of $A'A + MKM'$ (and hence, the condition number), however, cannot be expressed explicitly while, in Section 3, the eigenvalues of $A'A + PKP'$ and the condition number can. As a result, we cannot explicitly construct minimax estimators with numerical stability as in Theorems 4.1 and 4.3.

5. Numerical results. In this section we investigate numerically the risk performance and condition number performance of our simple Bayes estimator $\hat\theta_{SB}$, given by Corollary 4.4, under $L_j$ for $j = 0, 1, 2$. In the setting of Theorem 4.1, $\eta_{**} = (\max\{0, 1 - j\} + \eta_*)/2$, $\alpha = u_+$ and $\gamma = \alpha/\{(\alpha + 1)v_+\}$ are chosen. In the setting of Theorem 4.3, $\nu_{**} = \nu_*/2$, $\alpha = u_-$ and $\gamma = \alpha/\{(\alpha + 1)v_-\}$ are chosen. Simulation experiments are done in the following case: $p = 9$, $n = 10$, $D = \mathrm{diag}(\mu^4, \mu^3, \mu^2, \mu, 1, \mu^{-1}, \mu^{-2}, \mu^{-3}, \mu^{-4})$, where $\mu = 1.2, 1.6, 2.0, 2.4$ and $\theta_i = 0, 0.5, 1, 1.5, 2$ for any $i$. The corresponding condition numbers of $D$, $\mu^8$ (and hence, the condition numbers of $\hat\beta$ in the original problem), are approximately 4.3, 43, 256 and 1100, respectively. For only two cases, $\mu = 2$ and $2.4$ under $L_0$, Theorem 4.3 is applied.
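The quoted condition numbers and the split between the settings of Theorems 4.1 and 4.3 follow directly from the design $D$ and can be reproduced in a few lines. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

mus = (1.2, 1.6, 2.0, 2.4)
kappa = {}
needs_thm43 = {}
for mu in mus:
    # D = diag(mu^4, mu^3, ..., mu^-4), so p = 9.
    d = mu ** np.arange(4, -5, -1.0)
    kappa[mu] = d[0] / d[-1]                      # condition number mu^8
    for j in (0, 1, 2):
        # Theorem 4.3 applies when sum_i (d_i/d_1)^(1-j) - 2 <= 0.
        gap = np.sum((d / d[0]) ** (1 - j)) - 2.0
        needs_thm43[(mu, j)] = bool(gap <= 0.0)

print(kappa)
print([key for key, flag in needs_thm43.items() if flag])
```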

Table 1 shows the relative performance of our simple estimator with respect to risk and expected condition number (ECN), that is:

• $R(\theta, \hat\theta_{SB})/R(\theta, X)$,

• (expected condition number of $\hat\theta_{SB}$)$/\mu^8$,

from 50,000 replications, in each of the above cases. We can draw the following conclusions:

(i) When $\mu$ is large and $\sum\{d_i/d_1\} - 2$ is nonpositive, minimax stable estimators using Theorem 4.3 under $L_0$ have little gain both in the risk improvement and in the ECN improvement. From the numerical results, our contribution of Theorem 4.3 may be just theoretical.

(ii) Under $L_1$, minimax stable estimators have reasonable performances of risk and the ECN, regardless of $\mu$.

(iii) Under $L_2$, when $\mu$ is large, there is little to gain in risk improvement, while there is much to gain in ECN improvement. With better choices of $\eta_{**}$, $\alpha$ and $\gamma$, however, we may have more reasonable performances of risk and ECN.
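A single cell of such a simulation can be sketched as follows. This is only an illustration under stated assumptions: normal errors with $\sigma = 1$, the setting of Theorem 4.1 (so it is not valid for $\mu = 2$ or $2.4$ under $L_0$), the tuning choices described above, fewer replications than in the paper, and hypothetical function names. It estimates the risk ratio only and does not compute the ECN.

```python
import numpy as np

def eta_star(d, iters=200):
    """Root of sum_i (d_i/d_1)^eta = 2 by bisection (assumes d_1 > d_2)."""
    ratio, lo, hi = d / d[0], 0.0, 200.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(ratio ** mid) > 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def v_plus(d, c):
    """(4.1), minimum over pairs i > k."""
    vals = []
    for i in range(len(d)):
        for k in range(i):
            num = c[i] * c[k] * (d[0] * d[k] - d[i] * d[-1])
            den = c[i] * d[0] * d[k] - c[k] * d[i] * d[-1]
            if den > 0:
                vals.append(num / den)
    return min(vals)

def risk_ratio(mu=1.2, j=1, theta_value=0.0, n=10, reps=20000, seed=0):
    """Monte Carlo estimate of R(theta, theta_SB) / R(theta, X) under L_j."""
    rng = np.random.default_rng(seed)
    d = mu ** np.arange(4, -5, -1.0)
    p = len(d)
    eta2 = 0.5 * (max(0.0, 1.0 - j) + eta_star(d))
    c = (d[0] / d) ** (j - 1 + eta2)
    alpha = 2.0 / (n + 2) * (np.sum((d / d[0]) ** eta2) - 2.0)   # alpha = u_+
    gamma = alpha / ((alpha + 1.0) * v_plus(d, c))
    theta = np.full(p, theta_value)
    # Canonical model with sigma = 1: X ~ N(theta, D), S ~ chi-square with n d.f.
    X = theta + np.sqrt(d) * rng.normal(size=(reps, p))
    S = rng.chisquare(n, size=reps)
    W = np.sum(X ** 2 / (c * d), axis=1) / S
    shrink = alpha / (gamma * (alpha + 1.0) + W)
    est = (1.0 - shrink[:, None] / c) * X
    loss_est = np.sum((est - theta) ** 2 / d ** j, axis=1)
    loss_ls = np.sum((X - theta) ** 2 / d ** j, axis=1)
    return loss_est.mean() / loss_ls.mean()

print(risk_ratio())
```

A ratio below one indicates that the simple estimator improves on $X$ at the chosen parameter point; other rows of the table correspond to other values of `mu`, `j` and `theta_value`.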

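Table 1
Relative performance of our simple estimators under $L_j$ for $j = 0, 1, 2$

                      $L_0$             $L_1$             $L_2$
 $\mu$  $\theta_i$   risk    ECN      risk    ECN      risk    ECN
 1.2    0            0.78    0.761    0.785   0.643    0.674   0.495
        0.5          0.809   0.791    0.808   0.669    0.703   0.511
        1            0.866   0.852    0.857   0.734    0.769   0.565
        1.5          0.912   0.902    0.903   0.806    0.837   0.641
        2            0.941   0.935    0.934   0.861    0.889   0.718
 1.6    0            0.894   0.95     0.778   0.597    0.955   0.417
        0.5          0.917   0.963    0.801   0.637    0.956   0.425
        1            0.95    0.981    0.848   0.723    0.961   0.449
        1.5          0.967   0.989    0.891   0.803    0.967   0.487
        2            0.978   0.996    0.923   0.86     0.973   0.537
 2      0            0.994   0.999    0.778   0.594    0.995   0.415
        0.5          0.994   0.999    0.807   0.652    0.995   0.419
        1            0.995   0.999    0.861   0.757    0.995   0.43
        1.5          0.995   1        0.905   0.839    0.996   0.449
        2            0.995   1        0.934   0.89     0.996   0.475
 2.4    0            0.994   1        0.778   0.593    0.998   0.422
        0.5          0.994   1        0.819   0.679    0.999   0.424
        1            0.994   1        0.882   0.802    1       0.431
        1.5          0.994   1        0.925   0.879    1       0.442
        2            0.994   1        0.95    0.921    1       0.456

Appendix

Proof of Theorem 2.1. Let

$$F(x) = \frac{1}{2}\int_x^\infty f(t)\,dt$$

and define

$$E^f[h(X, Z)] = \int\!\!\int h(x, z)\,\sigma^{-N}\prod_{i=1}^{p} d_i^{-1/2} f\Bigl(\frac{(x - \theta)'D^{-1}(x - \theta) + z'z}{\sigma^2}\Bigr)dx\,dz$$

and

$$E^F[h(X, Z)] = \int\!\!\int h(x, z)\,\sigma^{-N}\prod_{i=1}^{p} d_i^{-1/2} F\Bigl(\frac{(x - \theta)'D^{-1}(x - \theta) + z'z}{\sigma^2}\Bigr)dx\,dz,$$

where $h(x, z)$ is an integrable function. The identities corresponding to the Stein and chi-square identities for the normal distribution,

$$E^f[(X_i - \theta_i)h(X, Z)] = d_i\sigma^2 E^F[(\partial/\partial X_i)h(X, Z)], \tag{A.1}$$
$$E^f[S g(S)] = \sigma^2 E^F[n g(S) + 2S g'(S)], \tag{A.2}$$

where $S = Z'Z$, are useful in our following proof. We use the version derived in [13], but earlier versions appear in [16] and elsewhere.

The risk of $\hat\theta_\phi$ is given by

$$E^f[(\hat\theta_\phi - \theta)'D^{-j}(\hat\theta_\phi - \theta)/\sigma^2] = R_j(\theta, \sigma^2, X) + E^f\Bigl[\frac{\phi^2(W)}{W^2}\,\frac{\sum\{X_i^2/(c_i^2d_i^j)\}}{\sigma^2}\Bigr] - 2E^f\Bigl[\frac{\phi(W)}{W}\,\frac{\sum\{X_i(X_i - \theta_i)/(c_id_i^j)\}}{\sigma^2}\Bigr]. \tag{A.3}$$

Let $W = X'C^{-1}D^{-1}X/S$. For the second term in (A.3), using (A.2), we have

$$E^f\Bigl[\frac{\phi^2(W)}{W^2}\,\frac{X'C^{-2}D^{-j}X}{S}\,\frac{S}{\sigma^2}\Bigr] = E^F\Bigl[\frac{X'C^{-2}D^{-j}X}{X'C^{-1}D^{-1}X}\Bigl\{(n + 2)\frac{\phi^2(W)}{W} - 4\phi(W)\phi'(W)\Bigr\}\Bigr].$$

For the third term in (A.3), using (A.1), we have

$$E^f\Bigl[\frac{\phi(W)}{W}\sum\frac{X_i(X_i - \theta_i)}{c_id_i^j\sigma^2}\Bigr] = E^F\Bigl[\frac{\phi(W)}{W}\sum\frac{d_i^{1-j}}{c_i} + 2\,\frac{X'C^{-2}D^{-j}X}{X'C^{-1}D^{-1}X}\Bigl\{\phi'(W) - \frac{\phi(W)}{W}\Bigr\}\Bigr].$$

Hence, since $\phi'(w) \ge 0$, we have

$$R_j(\theta, \sigma^2, \hat\theta_\phi) = R_j(\theta, \sigma^2, X) + E^F\Bigl[\frac{\phi(W)}{W}\,\frac{X'C^{-2}D^{-j}X}{X'C^{-1}D^{-1}X}\{(n + 2)\phi(W) + 4\} - 2\frac{\phi(W)}{W}\sum\frac{d_i^{1-j}}{c_i} - 4\,\frac{X'C^{-2}D^{-j}X}{X'C^{-1}D^{-1}X}\,\phi'(W)\{\phi(W) + 1\}\Bigr]$$
$$\le R_j(\theta, \sigma^2, X) + E^F\Bigl[\frac{\phi(W)}{W}\,\frac{X'C^{-2}D^{-j}X}{X'C^{-1}D^{-1}X}\Bigl((n + 2)\phi(W) - 2\sum\Bigl\{\frac{d_i^{1-j}}{c_i}\Bigr\}\frac{X'C^{-1}D^{-1}X}{X'C^{-2}D^{-j}X} + 4\Bigr)\Bigr]$$
$$\le R_j(\theta, \sigma^2, X) + E^F\Bigl[\frac{\phi(W)}{W}\,\frac{X'C^{-2}D^{-j}X}{X'C^{-1}D^{-1}X}\Bigl((n + 2)\phi(W) - 2\,\frac{\sum\{d_i^{1-j}/c_i\}}{\max\{d_i^{1-j}/c_i\}} + 4\Bigr)\Bigr]$$
$$\le R_j(\theta, \sigma^2, X). \qquad \Box$$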
Proof of Theorem 3.1. If $c_1 \le c_2 \le \cdots \le c_p$, we have

$$\max_t \frac{d_i(1 - t/c_i)}{d_j(1 - t/c_j)} \le \frac{d_i}{d_j} \quad \text{for } i < j$$

and

$$\max_t \frac{d_i(1 - t/c_i)}{d_j(1 - t/c_j)} \le \frac{d_i(1 - t_0/c_i)}{d_j(1 - t_0/c_j)} \quad \text{for } i > j.$$

Hence, if

$$\max_{i>j}\Bigl(\frac{d_i(1 - t_0/c_i)}{d_j(1 - t_0/c_j)}\Bigr) \le \frac{d_1}{d_p}$$

or, equivalently,

$$t_0 \le \min_{i>j}\Bigl(\frac{c_ic_j(d_1d_j - d_id_p)}{c_id_1d_j - c_jd_id_p}\Bigr),$$

we have

$$\max_t \frac{\max_i d_i(1 - t/c_i)}{\min_j d_j(1 - t/c_j)} \le \frac{d_1}{d_p},$$

which proves part (i).

Suppose $c_p \ge c_1 \ge c_2 \ge \cdots \ge c_{p-1}$. Then $d_1(1 - t/c_1) \ge \cdots \ge d_{p-1}(1 - t/c_{p-1})$ and so

$$\max_t \frac{\max_{i=1,\ldots,p-1} d_i(1 - t/c_i)}{\min_{i=1,\ldots,p-1} d_i(1 - t/c_i)} \le \frac{d_1(1 - t_0/c_1)}{d_{p-1}(1 - t_0/c_{p-1})}.$$

Also,

$$\max_t \frac{d_1(1 - t/c_1)}{d_p(1 - t/c_p)} \le \frac{d_1}{d_p}$$

and

$$\max_t \frac{d_p(1 - t/c_p)}{d_{p-1}(1 - t/c_{p-1})} \le \frac{d_p(1 - t_0/c_p)}{d_{p-1}(1 - t_0/c_{p-1})}.$$

Hence, if

$$\max\Bigl(\frac{d_1(1 - t_0/c_1)}{d_{p-1}(1 - t_0/c_{p-1})},\ \frac{d_p(1 - t_0/c_p)}{d_{p-1}(1 - t_0/c_{p-1})}\Bigr) \le \frac{d_1}{d_p}$$

or, equivalently,

$$t_0 \le \min\Bigl(\frac{c_1c_{p-1}(d_{p-1} - d_p)}{c_1d_{p-1} - c_{p-1}d_p},\ \frac{c_{p-1}c_p(d_1d_{p-1} - d_p^2)}{c_pd_1d_{p-1} - c_{p-1}d_p^2}\Bigr),$$

we have

$$\max_t \frac{\max_i d_i(1 - t/c_i)}{\min_j d_j(1 - t/c_j)} \le \frac{d_1}{d_p},$$

which proves part (ii). □